Abstract: Filtering is one of the main operations in signal processing. The efficiency of a filter
depends mainly on its multiplier and adder. A modulo 2^n + 1 multiplier and a modulo 2^n + 1 adder
are used to design a modulo 2^n + 1 FIR filter architecture, which is useful in Residue Number
System arithmetic, Digital Signal Processing applications and cryptographic algorithms. In the
multiplier [1], one operand uses weighted representation and the other operand uses diminished-1
representation. This multiplier reduces the number of partial products, which in turn reduces
the operation time and power. The modulo 2^n + 1 adder [2] can produce modulo sums within the
range {0, 2^n}, which is wider than the range {0, 2^n - 1} produced by existing diminished-1
modulo 2^n + 1 adders. Since both units are designed efficiently, the proposed FIR filter is
expected to be efficient.

Keywords: FIR filter, diminished-1 representation, residue number system, modular arithmetic,
modular multiplier.

I. INTRODUCTION

The residue number system (RNS) has been employed for efficient parallel carry-free
arithmetic computations suitable for high-speed DSP applications because of its inherent
parallelism, modularity, fault tolerance and localized carry propagation. Since modulo
computations can achieve significant speedup over binary-system-based computation,
they are widely used in DSP processors, FIR filters and communication components. Some
arithmetic operations, such as addition and multiplication, can be carried out more efficiently in
RNS than in conventional two's complement systems. For the commonly used moduli sets, modulo
2^n + 1 addition is the most crucial step.

Many methods have been reported to speed up modulo 2^n + 1 addition. Depending
on the input/output data representation, these methods can be classified into two categories,
namely diminished-1 and weighted.

In the diminished-1 representation, each input and output operand is decreased by 1
compared with its weighted representation. Therefore, only n-bit operands are needed in
diminished-1 modulo 2^n + 1 addition, leading to smaller and faster components.

However, this incurs an overhead due to the translators from and to the binary weighted
system. On the other hand, the weighted representation uses (n + 1)-bit operands for
computations, avoiding the overhead of translators, but requires a larger area than the
diminished-1 representation.
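As a small illustration of the two representations, the following Python sketch converts between them. The function names are hypothetical, and the mapping d[A] = (A - 1) mod (2^n + 1) is assumed to apply to every weighted value in [0, 2^n]. It shows that A = 0 is the only value whose diminished-1 form does not fit in n bits.

```python
def to_diminished_1(a: int, n: int) -> int:
    """Map a weighted value A in [0, 2**n] to d[A] = (A - 1) mod (2**n + 1)."""
    modulus = (1 << n) + 1
    if not 0 <= a < modulus:
        raise ValueError("A must lie in [0, 2**n]")
    return (a - 1) % modulus


def from_diminished_1(d: int, n: int) -> int:
    """Inverse mapping: recover the weighted value from its diminished-1 form."""
    modulus = (1 << n) + 1
    if not 0 <= d < modulus:
        raise ValueError("d[A] must lie in [0, 2**n]")
    return (d + 1) % modulus


if __name__ == "__main__":
    n = 4
    for a in range((1 << n) + 1):
        d = to_diminished_1(a, n)
        fits = d < (1 << n)  # True when d[A] fits in n bits
        print(f"A={a:2d}  d[A]={d:2d}  fits in {n} bits: {fits}")
```

Every non-zero operand therefore fits in n bits, which is why zero needs separate treatment in diminished-1 circuits.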

Modulo multipliers can be divided into three categories, depending on the type of
operands that they accept and output:

1) the result and both inputs use weighted representation;

2) the result and both inputs use diminished-1 representation;

3) the result and one input use weighted representation, while the other input uses diminished-1.

For the first category, Zimmermann [10] used Booth encoding to realize the multiplier but
departed from diminished-1 arithmetic, which leads to a complex architecture with large area and
delay requirements. Sousa et al. [11] modified the radix-4 Booth recoding in order to take
advantage of diminished-1 arithmetic. The number of partial products is reduced to (n/2 + 1), but
the partial product generator and the correction term generator occupy a large area, and the
constant 2 has to be added in the final modular adder. Vergos [13] proposed new modulo
multipliers using non-Booth recoding. The number of partial products is n + 1 and each
partial product is n bits wide.

For the second category, Wang et al. [8] proposed diminished-1 multipliers with n-bit
input operands. The multipliers use non-Booth recoding and a zero partial-product counting
circuit. Handling of zero inputs and results was not considered. Sousa et al. [11] proposed
modulo multipliers for diminished-1 representation with treatment of zero operands. The
multipliers use a modified radix-4 Booth recoding and Wallace tree addition [3]. The number
of partial products is approximately halved, not counting the correction term and the
constant. The correction term generator is a complex combinational circuit. Furthermore, the
modification of the radix-4 Booth recoding complicates the partial product generator.
Efstathiou [12] designed a diminished-1 multiplier using non-Booth recoding. Treatment of
zero operands or results was not discussed. The number of partial products leads to a
large overhead in area and delay.

The third category [15] applies to some special applications, such as encryption algorithms
and FIR filters. Because one input uses diminished-1 representation, the new architecture can be
based on n-bit additions and a radix-4 Booth recoding scheme, which is efficient and regular.

Since the coefficients of RNS FIR filters are constant, their diminished-1 representations
can be pre-computed during the design process, and the conversion does not lie on
the critical path.

An improved weighted modulo 2^n + 1 adder design using diminished-1 adders with simple
correction schemes is achieved by subtracting the constant 2^n + 1 from the sum of two
(n + 1)-bit input numbers and producing carry and sum vectors. The modulo 2^n + 1 addition can
then be performed using parallel-prefix diminished-1 adders that take in the sum and carry
vectors plus the inverted end-around carry, with simple correction schemes. The modulo 2^n + 1
adder used here does not require the zero-detection hardware that is needed in diminished-1
modulo 2^n + 1 addition.

II. FIR FILTER

One of the most widely used operations in DSP applications is FIR filtering [4-6]. A variety of
approaches to implementing FIR filters have been pursued. Low-power architectures for linear-phase
FIR filters include retimed structures, balanced modular architectures, separated signed
processing data flow and modifications of the CSD representation.





Fig. 1. Linear phase FIR filter structure



The 8-tap modulo 2^n + 1 FIR filter structure is constructed using modulo 2^n + 1 multipliers
[1] and adders [2]. The proposed schemes not only address the linear-phase FIR filter but
can also improve the non-linear-phase FIR filter.
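To make the filter arithmetic concrete, here is a behavioural Python sketch of an 8-tap filter in which every multiplication and addition is reduced modulo 2^n + 1. It is a software model only, not the hardware architecture: the function names and coefficient values are hypothetical, and the folded variant assumes the usual linear-phase property of symmetric coefficients (h[j] = h[7 - j]), so that each coefficient is multiplied once by a pre-added pair of samples.

```python
def mod_add(a: int, b: int, n: int) -> int:
    """Weighted modulo 2**n + 1 addition (behavioural model)."""
    return (a + b) % ((1 << n) + 1)


def mod_mul(a: int, b: int, n: int) -> int:
    """Weighted modulo 2**n + 1 multiplication (behavioural model)."""
    return (a * b) % ((1 << n) + 1)


def sample_at(samples: list[int], index: int) -> int:
    """Samples before the start of the input are treated as zero."""
    return samples[index] if index >= 0 else 0


def fir_direct(samples: list[int], coeffs: list[int], n: int) -> list[int]:
    """Direct form: one modular multiplication per tap."""
    out = []
    for k in range(len(samples)):
        acc = 0
        for j, h in enumerate(coeffs):
            acc = mod_add(acc, mod_mul(h, sample_at(samples, k - j), n), n)
        out.append(acc)
    return out


def fir_linear_phase(samples: list[int], half_coeffs: list[int], n: int) -> list[int]:
    """Folded form for symmetric coefficients: pre-add, then multiply once per pair."""
    taps = 2 * len(half_coeffs)
    out = []
    for k in range(len(samples)):
        acc = 0
        for j, h in enumerate(half_coeffs):
            pair = mod_add(sample_at(samples, k - j),
                           sample_at(samples, k - (taps - 1 - j)), n)
            acc = mod_add(acc, mod_mul(h, pair, n), n)
        out.append(acc)
    return out


if __name__ == "__main__":
    n = 4                                   # modulus 2**4 + 1 = 17
    half = [3, 5, 7, 11]                    # hypothetical coefficients
    x = [(7 * i + 3) % 17 for i in range(20)]
    print(fir_direct(x, half + half[::-1], n))
    print(fir_linear_phase(x, half, n))
```

Both functions return the same residues; the folded form needs four modular multiplications per output instead of eight.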



III. MODULO 2^n + 1 MULTIPLIER ARCHITECTURE

Multiplication can be done more efficiently in RNS [7-9] than in conventional two's
complement systems. Efficient schemes for modulo multipliers have been studied intensively.
A modulo 2^n + 1 multiplier that uses non-Booth recoding has been proposed.

Modulo 2^n + 1 multiplier

The modulo 2^n + 1 arithmetic operations require (n + 1)-bit operands. To avoid (n + 1)-bit
circuits, the diminished-1 number system [19] has been adopted. Let d[A] be the diminished-1
representation of the normal binary number A ∈ [0, 2^n], that is,

$$d[A] = |A - 1|_{2^n + 1} \qquad (1)$$

When A ≠ 0, d[A] ∈ [0, 2^n - 1] is an n-bit number, so (n + 1)-bit circuits are avoided in this
case. In the new modulo 2^n + 1 multiplication, the result and one input use weighted
representation, while the other input uses diminished-1 representation.

Let d[A] = (a_{n-1} a_{n-2} ... a_0) be the diminished-1 representation of the weighted binary
number A, and let B = (b_n b_{n-1} ... b_0) and the output P = |A × B|_{2^n + 1} =
(p_n p_{n-1} ... p_0) all be weighted binary numbers. Although one operand uses diminished-1
representation, the new modulo 2^n + 1 multipliers avoid conversion circuits between weighted
and diminished-1 representations for some special applications, such as encryption algorithms
and FIR filters.

A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories
Indexed & Listed at: Ulrich's Periodicals Directory ©, U.S.A., as well as in Cabell's Directories of Publishing Opportunities, U.S.A.

International Journal of Management, IT and Engineering

http://www.ijmra.us

Modulo 2^n + 1 multiplier architecture

In accordance with radix-4 Booth recoding [14], the partial product generator (PPG)
can be constructed with a Booth encoder (BE) and a Booth selector (BS). Many implementations
of the BE and BS blocks have been published [15-18], but they can be reduced to two categories:
one with a 4-bit bus and the other with a 3-bit bus [20]. The proposed multiplier [1] uses the
3-bit bus approach. The BE block examines successive overlapping triplets b_{2i+1} b_{2i} b_{2i-1}
and encodes each as an element of the set {-2, -1, 0, 1, 2}. Each BE block produces 3 bits: 1x,
2x and Sign. These 3 bits, along with the multiplicand, are used to form the partial products,
which are produced by the BS blocks. Each BS block takes as inputs two successive bits of d[A].
There are two types of BS blocks in the proposed multipliers, BS+ and BS-, since the inverted
multi-bit left-circular shifts of d[A] and d[-A] are different.
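The recoding step can be illustrated with a short Python sketch of textbook radix-4 Booth recoding for a plain (non-modular) unsigned multiplier. It is not the modulo-specific BE block of the paper, whose truth table is not reproduced here: the function names, the zero-extension of the operand and the exact meaning given to the Sign output are assumptions made for illustration. The last value returned by `encoder_signals` is the XNOR of the 1x and 2x bits, which is 1 exactly when the digit is 0.

```python
def booth_radix4_digits(b: int, width: int) -> list[int]:
    """Recode an unsigned integer b < 2**width into radix-4 Booth digits.

    Digit i is computed from the overlapping triplet b[2i+1] b[2i] b[2i-1]
    as b[2i-1] + b[2i] - 2*b[2i+1], with b[-1] = 0.
    """
    if not 0 <= b < (1 << width):
        raise ValueError("b does not fit in the given width")
    # Zero-extend by two bits so the top triplet always exists (b is unsigned).
    bits = [(b >> i) & 1 for i in range(width)] + [0, 0]
    digits = []
    prev = 0  # b[-1]
    for i in range(0, width + 1, 2):
        low, high = bits[i], bits[i + 1]
        digits.append(prev + low - 2 * high)
        prev = high
    return digits


def encoder_signals(digit: int) -> tuple[int, int, int, int]:
    """Return (1x, 2x, Sign, zero flag) for one Booth digit in {-2, ..., 2}."""
    one_x = int(abs(digit) == 1)
    two_x = int(abs(digit) == 2)
    sign = int(digit < 0)            # assumed convention: 1 for negative digits
    zero_flag = 1 - (one_x ^ two_x)  # XNOR of the 1x and 2x bits
    return one_x, two_x, sign, zero_flag


if __name__ == "__main__":
    for value in (0b01111, 0b10000, 0b01011):
        digits = booth_radix4_digits(value, 5)
        print(value, digits, [encoder_signals(d) for d in digits])
```

Each digit selects 0, ±1 or ±2 times the multiplicand, so roughly half as many partial products are needed as with bit-by-bit multiplication.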

For the i-th partial product, 0 ≤ i ≤ k, there are 2i BS- blocks and (n - 2i) BS+ blocks.
Fig. 2 and Table I present the BE block and its truth table. Fig. 3 presents the BE_i blocks used
by the new multipliers. Fig. 4 presents the BS+ and BS- blocks and their truth tables. The
correction term generator (CTG) produces C, which has the form (... 0 x_{i+1} 0 x_i ... 0 x_0).
The 2i-th bit x_i is 1 when the BE_i block encodes 0; otherwise x_i is 0. One XNOR gate
accepting the 1x and 2x bits of the BE_i block can generate the 2i-th bit x_i.

TABLE I
Truth table for BE block

Fig. 2. Block diagram of Booth encoder

Fig. 3. Logic diagram for BE blocks: (a) BE_0 for n even; (b) BE_0 for n odd; (c) BE block for
n even; (d) BE_i, 1 < i < k for n even and 0 < i < k for n odd



Fig. 4. Truth table and BS+ and BS- blocks

The inverted EAC CSA tree reduces the operands to two numbers. The CSA tree is usually
constructed with full adders (FA). In our multipliers, however, the CSA stage that takes the
correction term C can be further simplified: with x_i ∈ {0, 1}, every other full adder in this
stage can be simplified to a half adder (HA). This CSA stage can be constructed using n/2 HAs
and n/2 FAs for even n, and (n + 1)/2 FAs and (n - 1)/2 HAs for odd n. The final adder is a
diminished-1 modulo 2^n + 1 adder. It is known that the diminished-1 modulo 2^n + 1 adder
outperforms the normal binary modulo 2^n + 1 adder in delay and area [21]. In this work, the
fastest diminished-1 modulo 2^n + 1 adder proposed in [22] is adopted.
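The behaviour of a diminished-1 adder can be sketched in a few lines of Python. The sketch relies on the standard identity that, for non-zero operands, the diminished-1 sum equals the n-bit sum of the two diminished-1 operands plus the inverted carry-out; whether the adder of [22] is organised exactly this way is an assumption, and the function name is hypothetical. The case where the true sum is congruent to 0 is outside the sketch, since a zero result cannot be expressed in n diminished-1 bits.

```python
def diminished_1_add(da: int, db: int, n: int) -> int:
    """Add two n-bit diminished-1 operands with an inverted end-around carry.

    da and db represent the non-zero weighted values da + 1 and db + 1.
    The result is the diminished-1 form of their modulo 2**n + 1 sum,
    provided that sum is not congruent to zero.
    """
    mask = (1 << n) - 1
    total = da + db
    c_out = total >> n                     # carry out of the n-bit addition
    return (total + (1 - c_out)) & mask    # feed back the inverted carry


if __name__ == "__main__":
    n = 4
    modulus = (1 << n) + 1
    for a, b in ((16, 15), (11, 5), (1, 1)):
        d_sum = diminished_1_add(a - 1, b - 1, n)
        print(f"|{a} + {b}|_{modulus} = {d_sum + 1}")
```

The exhaustive check below (for n = 4) skips operand pairs whose sum is a multiple of 17, which is exactly the situation that calls for zero detection in diminished-1 designs.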



IV. MODULO 2^n + 1 ADDER ARCHITECTURE

Instead of subtracting D, which is not a constant, from the sum of A and B as proposed in
[17], we add the constant value -(2^n + 1) to the sum of A and B. In addition, we
allow the two inputs A and B to lie in the range {0, 2^n}, which is 1 more than the range
{0, 2^n - 1} proposed in [18]. In the following, we present the design of our proposed weighted
modulo 2^n + 1 adder.

Given two (n + 1)-bit inputs A = a_n a_{n-1} ... a_0 and B = b_n b_{n-1} ... b_0, where
0 ≤ A, B ≤ 2^n, the weighted modulo 2^n + 1 sum of A and B can be represented as follows:

$$|A + B|_{2^n + 1} = \begin{cases} A + B - (2^n + 1), & \text{if } (A + B) > 2^n \\ A + B, & \text{otherwise} \end{cases}$$

This can be expressed as

$$|A + B|_{2^n + 1} = \begin{cases} |A + B - (2^n + 1)|_{2^n}, & \text{if } (A + B) > 2^n \\ |A + B - (2^n + 1)|_{2^n} + 1, & \text{otherwise} \end{cases}$$

It can easily be seen that the value of the weighted modulo 2^n + 1 addition can be obtained by
first subtracting 2^n + 1 from the sum of A and B (i.e., adding 0111...1) and then using the
diminished-1 adder to get the final modulo sum by making the inverted end-around carry the
carry-in.
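The equivalence of the two case expressions can be checked numerically. The Python sketch below evaluates both forms for every input pair with n = 4 (function names are hypothetical). One observation from running it: the second form agrees with the first for every pair with A + B ≥ 1, while for A = B = 0 it evaluates to 2^n, so that single pair is reported separately.

```python
def mod_sum_direct(a: int, b: int, n: int) -> int:
    """First form: subtract 2**n + 1 only when the sum exceeds 2**n."""
    s = a + b
    return s - ((1 << n) + 1) if s > (1 << n) else s


def mod_sum_translated(a: int, b: int, n: int) -> int:
    """Second form: reduce A + B - (2**n + 1) modulo 2**n, then correct by +1."""
    # Python's % returns a non-negative result for a positive modulus.
    t = (a + b - ((1 << n) + 1)) % (1 << n)
    return t if a + b > (1 << n) else t + 1


if __name__ == "__main__":
    n = 4
    top = 1 << n
    mismatches = [(a, b) for a in range(top + 1) for b in range(top + 1)
                  if mod_sum_direct(a, b, n) != mod_sum_translated(a, b, n)]
    print("pairs where the two forms differ:", mismatches)
```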







ISSN: 2249-0558 





Fig. 5. Architecture of the proposed modulo 2^n + 1 multiplier, shown for n = 16: the partial
products (PP) and the output of the Correction Term Generator are reduced by an inverted EAC CSA
tree built from 16-bit inverted EAC CSAs, and a diminished-1 modulo (2^16 + 1) adder produces
P = |A × B|_{2^16 + 1} (in weighted representation).

The method of weighted modulo 2^n + 1 addition of A and B is as follows.
Denoting Y' and U' as the carry and sum vectors of the summation of A, B and -(2^n + 1), where
Y' = y'_{n-2} y'_{n-3} ... y'_0 y'_{n-1} and U' = u'_{n-1} u'_{n-2} ... u'_0, the modulo addition
can be expressed as follows:

$$\begin{aligned}
|A + B - (2^n + 1)|_{2^n}
&= \left| \sum_{i=0}^{n-2} 2^i (a_i + b_i) + 2^{n-1} (2a_n + 2b_n + a_{n-1} + b_{n-1}) + \underbrace{11 \ldots 1}_{n} \right|_{2^n} \\
&= \left| \sum_{i=0}^{n-2} 2^i (a_i + b_i + 1) + 2^{n-1} (2a_n + 2b_n + a_{n-1} + b_{n-1} + 1) \right|_{2^n} \\
&= \left| \sum_{i=0}^{n-2} 2^i (2y'_i + u'_i) + 2^{n-1} (2a_n + 2b_n + a_{n-1} + b_{n-1} + 1) \right|_{2^n}
\end{aligned}$$

For i = 0 to n - 2, the values of y'_i and u'_i can be expressed as y'_i = a_i ∨ b_i and
u'_i = ¬(a_i ⊕ b_i), respectively (∨ denotes the logic OR operation). Since Y' and U' are only
n bits wide, the values of y'_{n-1} and u'_{n-1} must be computed taking the values of a_n, b_n,
a_{n-1} and b_{n-1} into consideration. It should be noted that 0 ≤ A, B ≤ 2^n, which means that
a_n = a_{n-1} = 1 or b_n = b_{n-1} = 1 would cause the value of A or B to exceed the range
{0, 2^n}. Thus, these input combinations are not allowed and can be viewed as don't-care
conditions, which helps to simplify the circuits for generating y'_{n-1} and u'_{n-1}. The
maximum value of 2a_n + 2b_n + a_{n-1} + b_{n-1} + 1 is 5, which occurs at a_n = b_n = 1. The
truth table for generating y'_{n-1} and u'_{n-1} is given in Table II.
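The bit-level relations can be checked with a short Python sketch. It assumes that each low-order position adds a_i + b_i + 1, so that the carry bit is the OR and the sum bit is the XNOR of the two input bits; the function names are hypothetical, and the contents of Table II are not modelled, only the value of the top term.

```python
def low_bits(a: int, b: int, n: int) -> tuple[list[int], list[int]]:
    """Carry bits y'_i and sum bits u'_i for positions i = 0 .. n-2."""
    ys, us = [], []
    for i in range(n - 1):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        ys.append(ai | bi)          # carry of a_i + b_i + 1
        us.append(1 - (ai ^ bi))    # sum bit of a_i + b_i + 1 (XNOR)
    return ys, us


def top_term(a: int, b: int, n: int) -> int:
    """Value of 2*a_n + 2*b_n + a_(n-1) + b_(n-1) + 1."""
    a_n, b_n = (a >> n) & 1, (b >> n) & 1
    a_hi, b_hi = (a >> (n - 1)) & 1, (b >> (n - 1)) & 1
    return 2 * a_n + 2 * b_n + a_hi + b_hi + 1


def translated_value(a: int, b: int, n: int) -> int:
    """Rebuild the weighted sum of the low-bit pairs and the top term."""
    ys, us = low_bits(a, b, n)
    low = sum((2 * y + u) << i for i, (y, u) in enumerate(zip(ys, us)))
    return low + (top_term(a, b, n) << (n - 1))


if __name__ == "__main__":
    n = 4
    top = 1 << n
    values = [top_term(a, b, n) for a in range(top + 1) for b in range(top + 1)]
    print("top term range:", min(values), "to", max(values))
```

For every valid input pair the rebuilt value is congruent to A + B - (2^n + 1) modulo 2^n, and the top term never exceeds 5.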



TABLE II
Truth table for generating y'_{n-1} and u'_{n-1}

Two examples of our proposed addition method are given as follows.

Example 1: Suppose n = 4, A = 16_10 = 10000_2 and B = 15_10 = 01111_2.
Step 1) (A + B) - (2^n + 1) => Y' = 1110_2, U' = 0000_2, FIX = 1.
Step 2) Y' + U' = 1110_2, c_out = 0 => Y' + U' + ¬(c_out ∨ FIX) = 1110_2 = |16 + 15|_17 = 14_10.

Example 2: Suppose n = 4, A = 11_10 = 01011_2 and B = 5_10 = 00101_2.
Step 1) (A + B) - (2^n + 1) => Y' = 1110_2, U' = 0001_2, FIX = 0.
Step 2) Y' + U' = 1111_2, c_out = 0 => Y' + U' + ¬(c_out ∨ FIX) = 10000_2 = |11 + 5|_17 = 16_10.
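The arithmetic of Step 2 in both examples can be replayed in Python. The sketch takes the Y', U', c_out and FIX values directly from the examples and assumes that the carry-in is the complement of (c_out OR FIX), the reading under which both results match the expected residues. It does not model how Y', U' and FIX are generated, and the function name is hypothetical.

```python
def final_step(y_vec: int, u_vec: int, c_out: int, fix: int) -> int:
    """Step 2 of the examples: Y' + U' plus the inverted (c_out OR FIX)."""
    carry_in = 1 - (c_out | fix)
    return y_vec + u_vec + carry_in


if __name__ == "__main__":
    # (A, B, Y', U', c_out, FIX) as listed in Example 1 and Example 2, n = 4.
    examples = [
        (16, 15, 0b1110, 0b0000, 0, 1),
        (11, 5, 0b1110, 0b0001, 0, 0),
    ]
    for a, b, y_vec, u_vec, c_out, fix in examples:
        result = final_step(y_vec, u_vec, c_out, fix)
        print(f"|{a} + {b}|_17 = {result} (reference {(a + b) % 17})")
```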

The architecture of our proposed adder is given in Fig. 6. As Fig. 6 shows, the FIX signal
can be computed in parallel with the translation to Y' + U', leading to efficient correction.

In addition, the hardware cost of our correction scheme and FAF is less than that of the one
proposed in [21], because two non-constant numbers have to be processed in the translation
stage of that design.

V. RESULTS



The results for the modulo 2^n + 1 adder, the modulo 2^n + 1 multiplier and the modulo 2^n + 1
FIR filter were obtained by simulation and synthesis using ModelSim and Xilinx ISE.

Module                        No. of Slices    Delay (ns)
Modulo 2^n + 1 Adder          20               23.018
Modulo 2^n + 1 Multiplier     83               30.903
Modulo 2^n + 1 FIR Filter     629              143.294




Fig. 6. (a) Architecture of our proposed weighted modulo 2^n + 1 adder with the correction
scheme. (b) Architecture of the translator -(2^n + 1). (c) Architecture of the correction scheme.
(d) Architecture of FAF and FA+, respectively.



VI. CONCLUSION



The efficiency of a filter depends mainly on its multiplier and adder. In this paper, a modulo
2^n + 1 FIR filter architecture is built using a modulo 2^n + 1 adder and a modulo 2^n + 1
multiplier. The multiplier uses weighted representation for one operand and diminished-1
representation for the other, and it reduces the number of partial products. The modulo 2^n + 1
adder can produce modulo sums within the range {0, 2^n}, which is wider than the range
{0, 2^n - 1} produced by existing diminished-1 modulo 2^n + 1 adders.

The architecture can be used in Digital Signal Processing applications, Residue Number System
arithmetic, cryptographic algorithms, etc.